SYSTEMS AND METHODS FOR ANALYSIS OF GENETIC NETWORKS

This application claims the benefit under 35 U.S.C. § 119(e) of provisional
application number 60/105,075, filed on October 21, 1998, which is herein incorporated by
reference in its entirety.

1. INTRODUCTION 

The present invention relates to experimental and algorithmic methods for analysis of
genetic regulatory networks. More particularly, the present invention provides methods of
partitioning genes within an organism into a plurality of groups and identification and
characterization of correlations between genes and groups of genes in the genomic networks
of real viruses, cells, and tissues.

2. BACKGROUND OF THE INVENTION 

Living systems, including viruses, prokaryotes, and eukaryotes, possess genetic
systems composed of structural genes; cis acting genes that serve to regulate the genetic
activities of nearby structural genes; and trans acting genes and trans acting factors, namely
genes whose RNA or protein or other product, or other factors, bind singly or in complexes
to cis acting sites to modulate the genetic activity of nearby structural genes. In addition,
eukaryotic genomes contain exons and introns. Gene regulation involves transcription into
RNA, in eukaryotes called heterogeneous nuclear RNA or hnRNA. In eukaryotes, the
hnRNA is processed in a variety of ways to create mature messenger RNA, or mRNA, that is
transported to the cytoplasm. In both prokaryotes and eukaryotes, mRNA in the cytoplasm
is translated into proteins. Those proteins may be subjected to post-translational
modifications including cleavage, phosphorylation, and so forth, that modulate their
biological activities.

The number of structural genes ranges from a few dozen in viruses to a few hundred
to a few thousand in bacteria, to perhaps 15,000 in Drosophila, to 20,000 in some plants, to
an estimated 80,000 to 100,000 in human cells.

Since the work of F. Jacob and J. Monod in 1961 and 1963, it has been clear that
genes, via their products, can increase or decrease the activity of other genes or turn other
genes "on" or "off." A gene is said to have been turned on if the activity of the gene
increases from a lower level to a higher level. A gene is said to have been turned off if the
activity of that gene decreases from a higher level to a lower or minimal level. Cells, in
short, have a vast, parallel processing molecular genetic regulatory network of the kinds of
genes and their products noted above, whose joint dynamical activity within and between
cells underlies both normal ontogeny and much of pathogenesis, ranging from viral infections
to cancer to metaplasias, to tissue degeneration and regeneration.

Understanding the coordinated behavior of the genetic regulatory network in cells
has emerged as among the most important problems in molecular, cell, and developmental
biology as well as in biomedicine. It is almost certainly true that a "post-genomic" medicine
will be one that learns to manipulate patterns of gene network activities within and between
cells to treat or prevent disease.

Genetic regulatory networks (GRN) model systems of interdependent variables
which change over time. A GRN comprises a plurality of variables, a system state defined as
the value of the plurality of variables, and a plurality of regulatory rules corresponding to the
plurality of variables which determine the next system state from previous system states.
GRNs can model a wide variety of real world systems such as the interacting components of
a company, the conflicts between different members of an economy and the interaction and
expression of genes in cells and organisms.
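
By way of non-limiting illustration, the three elements of a GRN named above (variables, a
system state, and regulatory rules that produce the next state) can be sketched as follows.
The sketch assumes binary activity values and a synchronous update, which are simplifications
chosen for brevity; the three rules and the names `State`, `Rule`, and `step` are hypothetical
and are not taken from this disclosure.

```python
from typing import Callable, Dict, Tuple

State = Tuple[int, ...]        # one 0/1 activity value per variable
Rule = Callable[[State], int]  # maps the previous state to one variable's next value


def step(state: State, rules: Dict[int, Rule]) -> State:
    """Synchronous update: every variable applies its rule to the previous state."""
    return tuple(rules[i](state) for i in range(len(state)))


# Hypothetical three-variable network, for illustration only.
rules: Dict[int, Rule] = {
    0: lambda s: s[1] & s[2],  # variable 0 is on only if variables 1 and 2 are on
    1: lambda s: s[0] | s[2],  # variable 1 is on if variable 0 or variable 2 is on
    2: lambda s: 1 - s[0],     # variable 2 is switched off by variable 0
}

state: State = (1, 0, 0)
trajectory = [state]
for _ in range(4):
    state = step(state, rules)
    trajectory.append(state)
```

Each entry of `trajectory` is one system state; the rules alone determine every successor
state from the state before it.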

Genes contain the information for constructing and maintaining the molecular
components of a living organism. Genes directly encode the proteins which make up cells
and synthesize all other building blocks and signaling molecules necessary for life. During
development, the unfolding of a genetic program controls the proliferation and differentiation
of cells into tissues. Since the function of a protein depends on its structure, and hence on its
amino acid sequence and the corresponding gene sequence, the pattern of gene expression
determines cell function and hence the cell's system state and the rules by which the state is
changed.

In a GRN representing the interaction and expression of genes in cells and single-cell
organisms, the variables represent the activation states of the genes. For example, the level
of activity of a gene can be measured by the number of messenger RNA (mRNA) transcripts
of the gene made per unit time or the number of proteins translated from the mRNAs per
unit time. The regulatory rules are determined by the transcription regulatory sites next to
each gene and the interactions between the gene products and these sites. Binding of
molecules to these sites in various combinations and concentrations determines the degree of
expression of the corresponding gene. Since these molecules are proteins or RNAs made by
other genes, the network rules are functions of the activation states of the genes which they
control. Genes are constantly exposed to varying concentrations of these controlling
substances, so such a system can be considered as a GRN with an asynchronous, continuous
time update rule.
As is well-known for all types of dynamical systems, these networks demonstrate
attractors and basins of attraction. See Stephen E. Harris, Bruce K. Sawhill, Andrew
Wuensche, and Stuart A. Kauffman, Biased eukaryotic gene regulation rules suggest
genome behavior is near edge of chaos, Technical Report 97-05-039, Santa Fe Institute,
1997 (Harris et al.); Roland Somogyi and Carol Ann Sniegoski, Modeling the complexity of
genetic networks: Understanding multigenic and pleiotropic regulation, Complexity,
1(6):45-63, 1996; Stuart Kauffman, The Origins of Order, Oxford University Press, New
York, 1993 (Origins of Order). An attractor is a state or set of states to which the system
moves and then remains within for all future generations. Thus, an attractor is a recurrent
pattern of states of system variables that typically occupies a sub-volume of the space
containing all possible states of system variables. A basin of attraction is the set of states
that eventually lead to a given attractor. In general, a system of N variables can have
between 1 and 2^N attractors with basins ranging in size from the entire space of possible
states of system variables to individual states.
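
By way of non-limiting illustration, attractors and basins of attraction can be enumerated
exhaustively for a small deterministic binary network by following every initial state until a
state repeats. The three-gene rule set below (two mutually repressing genes and a third gene
that copies the first) and the function names are assumptions introduced only for this sketch.

```python
from itertools import product


def step(state):
    """Hypothetical rules: a and b repress each other; c copies a."""
    a, b, c = state
    return (1 - b, 1 - a, a)


def find_attractors(step_fn, n):
    """Map each attractor (a frozenset of states) to the number of states in its basin."""
    basins = {}
    for start in product((0, 1), repeat=n):
        seen = {}  # state -> position along the trajectory
        s = start
        while s not in seen:
            seen[s] = len(seen)
            s = step_fn(s)
        # s is the first revisited state; everything from it onward is the cycle.
        cycle = frozenset(x for x, idx in seen.items() if idx >= seen[s])
        basins[cycle] = basins.get(cycle, 0) + 1
    return basins


basins = find_attractors(step, 3)
```

For this rule set the eight possible states divide among two single-state attractors, each
draining a basin of two states, and one two-state cycle draining a basin of four states.
Exhaustive enumeration is practical only for small N, since the state space has 2^N states.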

In a non-limiting interpretation that guides some of the procedures of the present
invention, different cell types of an organism are interpreted to correspond to different
attractors in the dynamic genetic network of the genes, cells, and cell types of that
organism.

Identifying the GRN representing the interaction and expression of genes in a class of
cells is of fundamental importance for medical diagnostic and therapeutic purposes. For
example, normal and cancerous cells may have identical surface markers and surface
receptors and can be difficult to distinguish with chemotherapeutic agents. A GRN model of
the interaction and expression of genes in the cells can indicate functional differences
between normal and cancerous cells that provide a basis for differentiation not dependent on
cell surface markers. The GRN also provides a means to identify the receptors or genetic
targets to which molecule design techniques such as combinatorial chemistry and high
throughput screening should be directed to achieve given functional effects. Such techniques
are frequently used now, and pharmaceutical and biotechnology companies suffer from
uncertainty as to which targets and receptors are worthy of study. The approach described
below can greatly assist in this process. See Gene Regulation and the Origin of Cancer: A
New Method, A. Shah, Medical Hypotheses (1995) 45, 398-402 and Cancer progression:
The Ultimate Challenge, Renato Dulbecco, Int. J. Cancer Supplement 4, 6-9 (1989).
3. SUMMARY OF THE INVENTION

The current invention lays out a set of comprehensive procedures, experimental and
algorithmic, for the analysis of real genetic regulatory networks of biological systems.
These novel procedures are based, in part, on published studies of model genetic networks
as described above and reviewed in Origins of Order, by Kauffman and the Kauffman
Ballivet U.S. patent application No. 09/165,794 filed October 2, 1998, by inventors
Kauffman and Ballivet, each of which is incorporated herein by reference in its entirety.

A further aim of the present invention is to provide experimental and algorithmic
means to identify isolated green islands in the genomic networks of real viruses, cells, and
tissues.

The present invention provides a method for partitioning a plurality of genes into
one or more groups comprising the steps of: selecting a first one of said genes and a second
one of said genes; measuring a degree of correlation between said first gene and said
second gene; and assigning said first gene and said second gene into a same one of said
groups if said degree of correlation exceeds a predetermined threshold.
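
By way of non-limiting illustration, the steps recited above can be sketched as follows. The
sketch makes several assumptions that are not specified in this summary: the degree of
correlation is taken to be the absolute Pearson correlation of expression profiles, grouping
is made transitive with a union-find structure, and the gene names, profile values, and
threshold are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def partition_genes(profiles, threshold):
    """Place two genes in the same group if their correlation exceeds the threshold."""
    parent = {g: g for g in profiles}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path compression
            g = parent[g]
        return g

    for g1, g2 in combinations(profiles, 2):       # select a first and a second gene
        degree = abs(pearson(profiles[g1], profiles[g2]))  # measure correlation
        if degree > threshold:                     # assign to a same group
            parent[find(g1)] = find(g2)

    groups = {}
    for g in profiles:
        groups.setdefault(find(g), set()).add(g)
    return sorted(groups.values(), key=sorted)


# Hypothetical expression profiles over four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [5.0, 1.0, 4.0, 2.0],
    "geneD": [5.0, 5.0, 5.0, 5.0],
}
groups = partition_genes(profiles, threshold=0.9)
```

Other measures of correlation, such as the mutual information referred to in FIGS. 3 and 4,
could be substituted for `pearson` without changing the grouping logic.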

It is an aspect of the invention to provide a system for partitioning a plurality of
genes into one or more groups comprising:

a programmed computer comprising a memory having at least one region storing
computer executable program code and a processor for executing the program code stored
in said memory, wherein the program code includes:

code to select a first one of said genes and a second one of said genes;
code to measure a degree of correlation between said first gene and said
second gene; and
code to assign said first gene and said second gene into a same one of said
groups if said degree of correlation exceeds a predetermined threshold.

It is an aspect of the invention to provide a method for partitioning a plurality of
genes into one or more groups comprising the steps of:
defining a state for each of said genes;
selecting at least one of said genes;
initiating a perturbation on said selected gene to change said state of said
selected gene;
identifying zero or more of said genes that experience a change in said state
in response to said perturbation.
Methods for perturbing the physiological state of biological samples to effect
different gene activation states are described in detail below. Furthermore, methods for
detecting gene expression of a plurality of genes in the biological samples, including
expression levels resulting from such perturbations, are described in detail below. The
information obtained from measuring, quantitatively or qualitatively, the level of expression
of a plurality of genes in a biological sample can be used in accordance with the methods
disclosed herein to identify and characterize genetic regulatory networks.

The identification and characterization of such genetic regulatory networks provides
a characteristic "snapshot" of the physiological state of the biological sample. The
information that constitutes these snapshots includes characterization of the expression level
of a plurality of genes that constitute a genetic regulatory network, or a sub-network within
a given genetic regulatory network. These snapshots, therefore, are useful in designing
approaches for identifying disease states and designing approaches for disease intervention.
Therefore, the methods described herein are useful in disease diagnosis, identifying targets
for therapeutic intervention, and monitoring the progress of therapeutic treatments. More
particularly, the characteristic gene activation state of a diseased biological sample can be
compared to that of a normal sample to provide a ready indicator of disease. Moreover,
individual genes that are expressed at a different rate in a disease state as compared with a
normal state are candidates for therapeutic modalities that alter their expression to
approximate the expression level of the normal state. In addition, the progress of treatment
regimens can be monitored by examining the gene activation state of a biological sample of a
subject at different stages of treatment. The effectiveness of the treatment is indicated by a
progression of the gene expression pattern from the disease state to the normal state. Thus,
treatment regimens can be optimized by correlating the treatment with a change in
expression pattern that approaches that of the normal state.

4. BRIEF DESCRIPTION OF THE FIGURES 

FIG. 1 is a plot of the expected distance between two states at time T+1 as a function
of the normalized distance between the two states a moment earlier.

FIG. 2 is a plot of the log of the number of avalanches versus the log of the size of
the avalanche produced by reversing the activity of a single randomly chosen gene within a
model genetic network.

FIG. 3 is a histogram of the number of times a given mutual information was
observed for pairs of genes within the same green islands of a model genetic network.

FIG. 4 is a histogram of the number of times a given mutual information was
observed for pairs of genes within different green islands of a model genetic network.

FIG. 5 discloses a representative computer system in conjunction with which the
embodiments of the present invention may be implemented.

5. DETAILED DESCRIPTION OF THE INVENTION

The present invention presents, without limitation, a number of experimental and
algorithmic methods to establish whether eukaryotic cells are in the ordered regime, whether
isolated green islands exist, to determine which genes are members of which isolated green
islands, which genes are members of the red frozen structure, the regulatory connections and
rules among the genes within each isolated green island and the network more generally.

5.1 Numerical Models of Genetic Networks

A broad area of mathematical and algorithmic work has been carried out in which
genes are modeled either as binary variables, e.g., "on-off" devices; as "piecewise linear"
devices, or as "continuous sigmoidal" devices. See, Origins of Order, incorporated herein by
reference. Broadly, the same results are obtained in all cases. Some of the results of these
simulations are listed below and constitute some of the conceptual background for the
present invention.

1) Parallel processing networks of thousands of genes and their products behave
in two broad regimes: one, an ordered regime or, two, a chaotic regime.

2) A rough phase transition, often defined as the "edge of chaos", separates
these two regimes in the appropriate network parameter spaces.

3) Whether ordered or chaotic, all these model genetic networks are parallel
processing non-linear dynamical systems. For deterministic models in all these classes, the
generic dynamical behavior of a typical network breaks up the state space of possible
combinations of gene, RNA, and protein activities into one or more "basins of attraction".
Within each basin of attraction, the vector field, or transitions between discrete states,
representing the dynamics of the system, yields trajectories that flow, rather like creeks
flowing to a mountain lake, into a recurrent subset of the state space called an "attractor". In
continuous systems, the attractor might be a steady state, a limit cycle, a quasiperiodic orbit,
or a chaotic "strange attractor". In discrete deterministic synchronous state spaces the
attractor is typically a "state cycle" around which the system orbits. The length, or number
of states, on the state cycle can range from 1 to all the possible states.

If the network has more than one basin of attraction, then the system will flow to the
attractor that "drains" the basin of attraction in which the network was initiated. The set of
alternative attractors represent the alternative asymptotic behaviors of the network.

A variety of characteristics distinguish the ordered from the chaotic regime. See,
Origins of Order, incorporated herein by reference. Briefly, in the ordered regime, the
network generically flows from any initial state to an attractor. Initially, genes may turn on
and off, e.g. increase or decrease in activity, e.g. expression levels, as may levels of their
RNA and protein products, in complex temporal patterns. But, for binary synchronous
networks, as the attractor is approached, the activities of more and more genes and their
products become fixed in on or fixed in off values. For the piecewise linear or continuous
sigmoidal networks, more and more genes and their products become essentially "fixed" at
near minimal or near maximal activities. It is convenient to designate these fixed genes,
variables, or products as "red". Then in the ordered regime, typically, a red connected
cluster of genes and products percolates or extends across the network. The technical
definitions of percolates include the concepts that the size of the red "frozen" sea of fixed
genes scales up in size with the size of the genetic network, and that in any such network
there are connected regulatory pathways among the fixed genes along which all the genes on
the pathway are fixed on or off. In the ordered regime, this "core" of the red frozen
structure is the same for all the different attractors of the entire network. Thus, for example,
if the genetic network has 200 different attractors, e.g. cell types, the red frozen core would
be in substantially the same fixed state of activity, with perhaps small modulations, in all 200
attractors.

In the ordered regime, once the red frozen structure forms, isolated islands or groups
of variables or genes and their products remain that may either turn on and off in complex
temporal patterns, and/or may have two or more alternative steady activity states, e.g., levels
of expression. It is convenient to designate these genes and products, whose activities can
vary within and especially between attractors, as "green". For example, genes having an
expression level that varies between different cell types of the organism may be designated as
green.
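
By way of non-limiting illustration, the "red" and "green" designations can be assigned for a
binary model once its attractors are known: a gene is red if its value is the same in every
state of every attractor, and green otherwise. The function name and the two example
attractors below are hypothetical.

```python
def classify_genes(attractors):
    """Split gene indices into red (fixed on all attractors) and green (varying).

    attractors: list of attractors, each a list of equal-length 0/1 state tuples.
    """
    states = [s for attractor in attractors for s in attractor]
    n = len(states[0])
    red, green = [], []
    for i in range(n):
        values = {s[i] for s in states}  # every value gene i takes on any attractor
        (red if len(values) == 1 else green).append(i)
    return red, green


# Hypothetical attractors of a four-gene network.
attractors = [
    [(1, 0, 0, 1)],                # a steady state
    [(1, 0, 1, 0), (1, 0, 1, 1)],  # a state cycle of length 2
]
red, green = classify_genes(attractors)
```

Here genes 0 and 1 are red; gene 2 is green because it differs between attractors, and gene 3
is green because it varies both within and between attractors.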

The genes within any one green island can form simple or complex regulatory
sub-networks of the entire genetic network. For example, the expression levels of genes within a
given subnetwork, e.g. green island, may exhibit one or more attractors, e.g., recurrent states
of expression level. Additionally, expression levels of genes within a given green island may
exhibit correlated behavior. However, the expression levels of genes and gene products
within different green islands are functionally isolated from one another in the sense that
alterations in the expression levels or activities of variables, genes, or gene products in one
island cannot perturb the expression levels of genes or activities of variables or genes within
other isolated green islands. That is, variables within different green islands exhibit generally
uncorrelated behavior.

In the ordered regime, nearby states in state space tend, on average, to lie on
trajectories that converge closer to one another in state space. That is, if homeostasis is
defined as "return after perturbation", in the ordered regime, the dynamics are homeostatic.
Specific algorithmic measures are known in the art, such as the "Derrida curve", to
characterize whether the dynamics of a model or real system show convergent flow in state
space. See, The Origins of Order, incorporated herein by reference. Briefly, two copies
of the system are "initiated" at pairs of states at different initial "distances". For binary
networks, a convenient measure of the distance between two binary states is the fraction of
binary variables by which they differ, called the normalized Hamming distance. Thus, for
example, (1111111111) and (0111111110) overlap in 8 of ten positions and differ by 2 so
that the normalized Hamming distance is 0.2. For continuous variables, a generalized
continuous Euclidean metric is convenient to measure the distance between two continuous
vectors of gene and product activities.
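
By way of non-limiting illustration, the two distance measures described above can be
computed as follows. The function names are assumptions introduced for this sketch; the
binary example reproduces the two ten-gene states given in the preceding paragraph.

```python
from math import sqrt


def normalized_hamming(a, b):
    """Fraction of positions at which two equal-length binary states differ."""
    if len(a) != len(b):
        raise ValueError("states must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors of continuous activities."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


d_binary = normalized_hamming("1111111111", "0111111110")  # differs in 2 of 10 positions
d_continuous = euclidean((0.0, 0.0), (3.0, 4.0))
```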

In numerical studies described in Origins of Order, each copy of the network is
allowed to undergo a short time evolution, corresponding to a single state transition in a
synchronous Boolean network, or a short interval corresponding to the time scale for genes
to change activation states or to turn on and off in that organism. The result of this time
evolution is that each copy arrives at a successor state from its initial state. The distances
between the successor states are measured. The final distance between states, D(T+1),
may be less than, equal to, or greater than the initial distance, D(T), between the states. In
the ordered regime, states at all initial distances, on average, tend to lie on trajectories that
converge. This is revealed by the fact that, for such systems, for all pairs of initial states, at
different initial distances, on average, D(T+1) is less than D(T).

Figure 1 shows a Cartesian coordinate system with D(T) plotted on the X axis, and
D(T+1) plotted on the Y axis. The resulting "Derrida curve", averaged for many pairs of initial
states at each initial distance, characterizes whether the system is in the ordered regime or
the chaotic regime. The Derrida curve is a recurrence relation showing the expected
distance D(T+1) between two states at time T+1 as a function of the normalized distance, D(T),
between two states a moment earlier. The main diagonal, D(T) = D(T+1), shows the conditions
under which two initial states lie on trajectories that neither diverge nor converge in state
space. For values of K, the number of inputs per gene, greater than 2, the Derrida curve lies
above the main diagonal for small initial distances between initial states, D(T). This
corresponds to the first step in an expanding avalanche of damage and is a signature of
chaotic behavior and sensitivity to initial conditions. For K=2 or less, the Derrida curve is
below the main diagonal for all initial distances, D(T), corresponding to convergence in state
space. K=2 is the phase transition to chaos.

In the ordered regime, the plot is everywhere below the "main diagonal" where
D(T+1) = D(T). In the chaotic regime, for some initial distances, typically small initial
distances, states tend to diverge. This is "the butterfly effect", or sensitivity to initial
conditions. Here, in the chaotic regime, D(T+1) is greater than D(T) for these initial
distances.
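
By way of non-limiting illustration, points on a Derrida curve can be estimated numerically
for a model binary network by the two-copy procedure described above. The sketch assumes a
synchronous random Boolean network in which each gene has K randomly chosen inputs (repeats
allowed) and an unbiased random truth table; the network size, number of trials, random seed,
and all function names are assumptions chosen for this sketch.

```python
import random


def random_network(n, k, rng):
    """Give each gene k randomly chosen inputs and a random Boolean truth table."""
    inputs = [[rng.randrange(n) for _ in range(k)] for _ in range(n)]
    tables = [[rng.randrange(2) for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables


def step(state, net):
    """One synchronous state transition of the whole network."""
    inputs, tables = net
    nxt = []
    for ins, table in zip(inputs, tables):
        index = 0
        for j in ins:
            index = (index << 1) | state[j]  # pack the input values into a table index
        nxt.append(table[index])
    return nxt


def derrida_point(net, n, d0, trials, rng):
    """Average D(T+1) over random pairs of states at initial normalized distance d0."""
    flips = round(d0 * n)
    total = 0.0
    for _ in range(trials):
        a = [rng.randrange(2) for _ in range(n)]
        b = list(a)
        for i in rng.sample(range(n), flips):  # make the copies differ in `flips` genes
            b[i] ^= 1
        a1, b1 = step(a, net), step(b, net)
        total += sum(x != y for x, y in zip(a1, b1)) / n
    return total / trials


rng = random.Random(0)
n = 200
curve = {}
for k in (1, 2, 4):
    net = random_network(n, k, rng)
    curve[k] = [(d0, derrida_point(net, n, d0, 50, rng)) for d0 in (0.0, 0.05, 0.1, 0.2)]
```

Plotting each list in `curve` against the line D(T+1) = D(T) gives an estimate of the curve
for that K; points below the line at small D(T) indicate convergence, and points above it
indicate divergence.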

A further characteristic of the ordered regime concerns the propagation of "damage"
in such networks following perturbation of one or more variables of the network as
evidenced by numerical simulations described in Origins of Order by Kauffman, above.
Consider two identical copies of a network of variables. Alter or perturb the activity value
of a single variable, e.g. expression level of a gene, or product in one of the two network
copies. Allow both network copies, the unperturbed and the perturbed, to evolve forward
for a time sufficient to allow at least some of the network variables to change state.
Determine the state or activity level of each network and compare the state of the perturbed
and unperturbed copies. A variable, gene, or product within the perturbed copy may be defined
as "damaged" if the state or level of activity of the variable is ever different, one or more
times, from the state or level of activity of the corresponding variable in the unperturbed
copy. The steps of allowing each network to evolve one or more steps and determining and
comparing the level of activity of the networks may be carried out repeatedly. Given this
definition, a site can be different in its activities from the unperturbed site many times, but
it is only damaged once and remains damaged thereafter.

Further, given this definition of damage, one can define the size of an avalanche of
damaged variables, genes, products or "sites" following the initial perturbation to a single
gene, gene product, or site. Generically, for a network in the ordered regime, the size
distribution of damage avalanches is a power law distribution with many small avalanches
and few large ones, due to many random choices of which gene at which network state to
perturb. To illustrate the power law, the logarithm of the size of the avalanche is plotted on
the X axis and the logarithm of the number of avalanches at that size is plotted on the Y axis
of a Cartesian coordinate system. A power law produces a straight line with a negative slope.
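
By way of non-limiting illustration, the damage definition and the avalanche size can be
sketched for a synchronous binary model as follows. The random network construction, the
number of steps, the sample counts, and the function names are assumptions made for this
sketch; the perturbed gene itself is counted as damaged because it differs from the
unperturbed copy at the moment of perturbation.

```python
import random
from collections import Counter


def random_network(n, k, rng):
    """Give each gene k randomly chosen inputs and a random Boolean truth table."""
    inputs = [[rng.randrange(n) for _ in range(k)] for _ in range(n)]
    tables = [[rng.randrange(2) for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables


def step(state, net):
    """One synchronous state transition of the whole network."""
    inputs, tables = net
    nxt = []
    for ins, table in zip(inputs, tables):
        index = 0
        for j in ins:
            index = (index << 1) | state[j]
        nxt.append(table[index])
    return nxt


def avalanche_size(net, state, gene, steps):
    """Number of genes that ever differ between perturbed and unperturbed copies."""
    a = list(state)        # unperturbed copy
    b = list(state)        # perturbed copy
    b[gene] ^= 1           # reverse the activity of a single gene
    damaged = {gene}
    for _ in range(steps):
        a, b = step(a, net), step(b, net)
        # A gene is damaged once it differs even one time, and stays damaged.
        damaged.update(i for i, (x, y) in enumerate(zip(a, b)) if x != y)
    return len(damaged)


rng = random.Random(1)
n = 100
net = random_network(n, 2, rng)
sizes = Counter()  # avalanche size -> number of avalanches of that size
for _ in range(200):
    state = [rng.randrange(2) for _ in range(n)]
    sizes[avalanche_size(net, state, rng.randrange(n), steps=20)] += 1
```

Plotting the logarithm of each count in `sizes` against the logarithm of its size gives the
kind of plot described above; a power law would appear as an approximately straight line.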

Figure 2 shows that a simulated binary network in the ordered regime has a power
law distribution of avalanches of changes in gene activities produced by reversing the
activation state of a single randomly chosen gene. The simulated network had N=65,000 on
or off genes, which is about equal to the number of genes in a human cell. The distribution
shows a finite cutoff with maximal avalanche size about equal to 2 or 3 times the square root
of the number of genes in the system. Thus, in the ordered regime very near the phase
transition to chaos, the power law size distribution has a maximum size avalanche that scales
as a rough square root function of the number of genes. Deeper in the ordered regime, as
measured, for example, by the Derrida curve lying further below the main diagonal, the
distribution of avalanches is a similar, slightly steeper power law, with a smaller maximum
size avalanche.

In the chaotic regime, the size distribution of avalanches shows a similar power law
distribution of relatively small avalanches, and a "spike" of huge avalanches that may involve
between 20% to 50% or more of the genes, the center of the spike shifting to higher
fractions as the network is deeper in the chaotic regime as measured by how much of the
Derrida curve lies above the main diagonal.

In the chaotic regime, rather than there being isolated green islands of genes and
products whose activities can vary within, or more importantly between, attractors, there is
instead a vast percolating "green sea" of connected genes and products all of which can vary
in activity within or between attractors. In the chaotic regime there may be one or more
isolated red frozen islands.

In the ordered regime, if damage is initiated by stimulating, inhibiting or otherwise
perturbing a gene or gene product in an isolated green island, the propagating avalanche of
damage is entirely, or almost entirely, confined to that green island. This reflects the fact that
alterations cannot propagate across the fixed percolating red frozen structure that isolates
the green islands from one another. In contrast, in the chaotic regime, perturbation of an
initial gene or product may unleash an avalanche that spreads to a finite fraction of the other
genes or products, corresponding to the huge avalanches seen in the chaotic regime. Here,
"finite" means that the size of the largest avalanche scales linearly with the size of the
network.

The size distribution of isolated green islands themselves is a power law, with more
small islands than large ones. The average size of an isolated green island scales
logarithmically with the size of the network.

In the ordered regime, a fundamental feature of the dynamics of the simulated
networks is that when the network settles to an attractor, each green isolated island is itself,
as a sub-network of the entire network, on an attractor. Each isolated green island may be
capable of one or more different alternative attractors. For example, in the piecewise linear
case deep in the ordered regime, these different attractors correspond to different steady
state expression levels of the genes and their products. Thus, importantly, the set of all the
different attractors of the entire network correspond to the "frozen red sea" in the same fixed
state on all attractors, and the green islands in their different attractors. Thus, the behavior
of the whole network in the ordered regime is fundamentally "combinatorial". For example,
let a network have three isolated green islands, A, B, and C. Let A have two alternative
attractors, B have three alternative attractors, and C have four alternative attractors.
Perhaps, for example, each attractor corresponds to a different steady state level of activities
of genes and products in the corresponding green isolated island. Then the total number of
attractors of the entire network is the product of the number of alternatives in the three
isolated green islands, 2 x 3 x 4 = 24. The alternative "choices" made by the different
attractors constitute a kind of "epigenetic code" that specifies the attractor of the entire
network. Given the identification of an attractor of the entire network and a cell type of an
organism, this epigenetic code word then specifies the cell type in question. See, Origins of
Order, incorporated herein by reference.
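
By way of non-limiting illustration, the combinatorial example above can be enumerated
directly. The attractor labels (A1, A2, and so on) are hypothetical placeholders; only the
counts of two, three, and four alternatives come from the example.

```python
from itertools import product

# Alternative attractors of each isolated green island (labels are placeholders).
island_attractors = {
    "A": ["A1", "A2"],
    "B": ["B1", "B2", "B3"],
    "C": ["C1", "C2", "C3", "C4"],
}

# Each code word picks one attractor per island and so names one whole-network attractor.
code_words = list(product(*island_attractors.values()))
total = len(code_words)  # 2 x 3 x 4
```

Each tuple in `code_words`, such as ("A1", "B1", "C1"), plays the role of one epigenetic code
word in the sense described above.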

The behavior of binary networks, piecewise linear networks, and sigmoidal networks
is similar, except that the latter two continuous networks tend to exhibit "green islands" in
which genes and products are at different steady state levels of activities on the different
attractors of each of those islands, whereas in the binary synchronous case, the activation
states of genes within one island generally decrease or increase in complex patterns on the
state cycle attractors of each island.

Numerical investigations of Boolean and piecewise linear networks have revealed the
homologous ordered and chaotic regimes and the same scaling law for the phase transition
between order and chaos as two parameters, P, characterizing a specific bias in the response
function, and the number of inputs per variable or gene, K, are tuned. See, Origins of Order,
and Glass and Hill, incorporated herein by reference. More recently, Hill and
colleagues have shown the same phase transition between order and chaos in a two parameter
plane corresponding to biases towards canalyzing functions on one axis and the number of
inputs per gene, K, on the other axis. A canalyzing function may be defined as any Boolean
function having the property that it has at least one input having at least one value (1 or 0)
which suffices to guarantee that the output of a variable or element regulated by the function
assumes a specific value (1 or 0). See Origins of Order, incorporated herein by reference,
for a more complete discussion of canalyzing functions. A logical "and" is such a function
because if either the first or second input is 0, the regulated output is guaranteed to be 0. By
contrast, a logical "exclusive or" is not a canalyzing function because no single state of
either input guarantees the behavior of the output.
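
By way of non-limiting illustration, the definition of a canalyzing function given above can
be checked by exhaustive enumeration of a Boolean function's truth table. The function name
`canalyzing_inputs` and its return format are assumptions introduced for this sketch.

```python
from itertools import product


def canalyzing_inputs(f, k):
    """List (input index, input value, forced output) triples for a k-input function f.

    An empty list means no single input value guarantees the output, so f is not
    canalyzing in the sense defined above.
    """
    found = []
    for i in range(k):
        for v in (0, 1):
            # Collect every output produced while input i is held at value v.
            outputs = {f(*bits) for bits in product((0, 1), repeat=k) if bits[i] == v}
            if len(outputs) == 1:
                found.append((i, v, outputs.pop()))
    return found


def logical_and(a, b):
    return a & b


def exclusive_or(a, b):
    return a ^ b


and_result = canalyzing_inputs(logical_and, 2)   # either input at 0 forces output 0
xor_result = canalyzing_inputs(exclusive_or, 2)  # no input value forces the output
```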

Recent results very strongly suggest that eukaryotic genes are biased to regulation by
canalyzing functions. This is based on actual observed transcription regulation for eukaryotic
genes with K = 3, 4, and 5 known direct regulatory inputs.

Mathematical analysis of model genetic networks with the observed biases towards
the same distribution of high numbers of canalyzing functions as seen in real regulated
eukaryotic genes firmly indicates, by Derrida curves, the presence of percolating red frozen
structures, and power law distributions of damage avalanches, that real eukaryotic cells lie in
the ordered regime. In short, mathematical and numerical work known in the art suggests
the same broad behavior regimes in binary and piecewise linear, and, with less data,
sigmoidal, model genetic networks.

Thus, it is very likely that real cells, eukaryotic and prokaryotic, lie in the ordered
regime with isolated green islands whose alternative attractors are central to cell
differentiation and may constitute an epigenetic code.

5.2 Measurement of Gene Activity

Herein, the term activation state of a gene refers to the level of gene activity, i.e.
the level of expression of the gene. The products of gene expression are transcripts (e.g.
hnRNA or mRNA) and translation products (i.e. proteins). Thus, the level of expression of a
gene in a biological sample can be characterized, e.g. measured, qualitatively or
quantitatively, or both, by detecting the abundance of transcripts or protein products of that
gene present in the biological sample.

The cell, set of cells, or tissue sample may correspond to cells or tissue obtained in
vivo from an organism or to cells or tissue obtained or grown in vitro.

In general, the methods of the invention comprise measuring the activation state or
level of expression of one or more genes within one or more biological samples.

Methods for measuring transcripts (e.g. hnRNA or mRNA) and protein products are
well known in the art. For example, and not by way of limitation, a parallel method for
measuring the level of gene activity within a sample includes the use of nucleotide arrays
such as those which are commercially available from Affymetrix Incorporated, as described
in U.S. Patent No. 5,837,832, which is hereby incorporated herein by reference in its
entirety. Using such arrays, transcript expression from a plurality of genes can be detected
and quantified in parallel over a wide range of expression levels to allow comparison of
expression levels for a plurality of different genes within a biological sample. The relative
expression of many genes can be simultaneously determined. Changes in the level of
expression over time or between samples may also be measured qualitatively or
quantitatively or both.

In an alternative approach, SAGE analysis (serial analysis of gene expression) or
similar analyses may also be used to characterize the activation state of a gene. SAGE, as
described in U.S. Patent No. 5,695,937, incorporated herein by reference in its entirety,
provides a method for the rapid analysis of numerous transcripts in order to identify the
overall pattern of gene expression in different cell types or in the same cell type under
different physiologic, developmental or disease conditions. The method is based on the
identification of a short nucleotide sequence tag at a defined position in a messenger RNA.
The tag is used to identify the corresponding transcript and gene from which it was
transcribed. By utilizing dimerized tags, termed a "ditag", SAGE allows elimination of
certain types of bias which might occur during cloning and/or amplification and possibly
during data evaluation. Concatenation of these short nucleotide sequence tags allows the
efficient analysis of transcripts in a serial manner by sequencing multiple tags on a single
DNA molecule, for example, a DNA molecule inserted in a vector or in a single clone. Each
technique for characterizing the level of gene transcription activity analyzes the RNA or
hnRNA, or mRNA content of a single cell, a set of cells of the same cell type, or a tissue
sample which may have one or more cell types to reveal the relative abundances of
transcripts of thousands of different genes simultaneously.

The level of gene expression may also be measured by characterizing the abundance
of translation products (e.g. proteins) within a biological sample. For example, and not as a
limiting example, the abundance of proteins within a biological sample may be characterized
by using two dimensional protein gels or similar parallel analysis methods, including, but not
limited to, analyses of translation rates and phosphorylation states.

In all embodiments of the invention, the characterization (e.g. qualitative and/or
quantitative measurement) of gene expression may be performed at a single point in time, or
at a plurality of succeeding points in time to establish a temporal record or time series of the
expression level of a plurality of genes within a biological sample. Thus, in a non-limiting
example, a time series might correspond to a series of measurements of the expression level
of genes within a cell line following introduction of a hormonal or other stimulus. For
example, a hormonal stimulus may be introduced to induce differentiation of the cell line. A
corresponding time series of measurements of gene expression could be acquired from a
similar cell line not receiving the hormonal or other stimulus and can be compared to the
corresponding time points in the treated sample. In another example, the level of gene
expression in a population of cells undergoing a cell cycle may be measured repeatedly to
acquire a time series.

In certain cases, it is now possible to measure and quantify the expression level of
genes from single cells, such as neurons. Additionally, using methods described in the U.S.
patent application No. 09/165,794 filed October 2, 1998, by inventors Kauffman and
Ballivet, hereby incorporated by reference in its entirety, the bound and unbound states of
one or a plurality of cis acting sites in a single cell or set of cells may be assayed to establish
in parallel the "cis activity" state of a cell or cells.

5.3 Perturbation of Physiological State of the Biological Sample
In accordance with the invention, the physiological state of a biological sample may
be perturbed by modulating the expression of one or more target genes, or the activity of
one or more target gene products, in the sample. Such modulation includes inhibiting or
enhancing the expression of the gene or the activity of the gene product, using methods well
known in the art.

5.3.1 Methods of Inhibiting Expression Of A Target
Gene Or Activity Of A Target Gene Product

Methods of inhibiting gene expression include, but are not limited to, knocking out
the gene expression by mutating the target gene such that a functional gene product is not
produced, and inhibiting the gene expression by adding a compound that inhibits the
expression of the gene. Such inhibitory compounds include, but are not limited to,
anti-sense mRNA, ribozymes, and triple helix forming oligonucleotides. In addition, a
compound that down-regulates gene expression, such as a metabolite that binds an
apo-repressor to form activated repressor, can be used to inhibit gene expression.
Moreover, compounds that inhibit the activity of the protein product of a target gene can be
used to inhibit the effect of that protein on a downstream target site (e.g., nucleotide
regulatory region or another protein), and thereby perturb the physiological state of the
biological sample. For example, in specific embodiments for inhibiting the activity of a
receptor protein, antibodies or other ligand analogues that bind the receptor and inhibit the
ability of the natural ligand to bind the receptor may be added to the biological sample.

5.3.1.1 Perturbation Through Targeted Inhibition Of Gene Expression
Among the compounds which may perturb the physiological state of a biological
sample through inhibition of the expression of a particular gene are antisense, ribozyme, and
triple helix molecules. Techniques for the production and use of such molecules are well
known to those of skill in the art.

Antisense RNA and DNA molecules act to directly block the translation of mRNA by
hybridizing to targeted mRNA and preventing protein translation.

Antisense approaches involve the design of oligonucleotides (either DNA or RNA)
that are complementary to target gene mRNA. The antisense oligonucleotides will bind to
the complementary target gene mRNA transcripts and prevent translation. Absolute
complementarity, although preferred, is not required. A sequence "complementary" to a
portion of an RNA, as referred to herein, means a sequence having sufficient
complementarity to be able to hybridize with the RNA, forming a stable duplex; in the case
of double-stranded antisense nucleic acids, a single strand of the duplex DNA may thus be
tested, or triplex formation may be assayed. The ability to hybridize will depend on both the
degree of complementarity and the length of the antisense nucleic acid. Generally, the longer
the hybridizing nucleic acid, the more base mismatches with an RNA it may contain and still
form a stable duplex (or triplex, as the case may be). One skilled in the art can ascertain a
tolerable degree of mismatch by use of standard procedures to determine the melting point of
the hybridized complex.

Oligonucleotides that are complementary to the 5' end of the message, e.g., the 5'
untranslated sequence up to and including the AUG initiation codon, should work most
efficiently at inhibiting translation. However, sequences complementary to the 3'
untranslated sequences of mRNAs have recently been shown to be effective at inhibiting
translation of mRNAs as well. See generally, Wagner, R., 1994, Nature 372:333-335.
Thus, oligonucleotides complementary to either the 5'- or 3'- non-translated, non-coding
regions of the target gene could be used in an antisense approach to inhibit translation of
endogenous target gene mRNA. Oligonucleotides complementary to the 5' untranslated
region of the mRNA should include the complement of the AUG start codon. Antisense
oligonucleotides complementary to mRNA coding regions are less efficient inhibitors of
translation but could be used in accordance with the invention. Whether designed to
hybridize to the 5'-, 3'- or coding region of target gene mRNA, antisense nucleic acids should
be at least six nucleotides in length, and are preferably oligonucleotides ranging from 6 to
about 50 nucleotides in length. In specific aspects the oligonucleotide is at least 10
nucleotides, at least 17 nucleotides, at least 25 nucleotides or at least 50 nucleotides.
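As an illustration of the design considerations above, the following minimal sketch derives
a fully complementary antisense DNA oligonucleotide for a chosen window of a target mRNA and
checks whether the window spans the start codon. The function names and the example sequence
are hypothetical, and treating the first AUG in the sequence as the initiation codon is a
simplifying assumption that is not taken from the disclosure.

```python
# Complement table for building a DNA oligo against an RNA (or cDNA) target.
_DNA_COMPLEMENT = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}


def antisense_oligo(mrna, start, length):
    """Return the DNA oligo, written 5'->3', complementary to mrna[start:start+length]."""
    if length < 6:
        raise ValueError("antisense oligonucleotides should be at least six nucleotides long")
    target = mrna.upper()[start:start + length]
    if len(target) < length:
        raise ValueError("target window runs past the end of the mRNA")
    # Reverse the target so the result reads 5'->3', then complement each base.
    return "".join(_DNA_COMPLEMENT[base] for base in reversed(target))


def covers_start_codon(mrna, start, length):
    """True if the window contains the first AUG (assumed here to be the start codon)."""
    aug = mrna.upper().find("AUG")
    return aug != -1 and start <= aug and aug + 3 <= start + length


if __name__ == "__main__":
    example_mrna = "GGAGCCAUGGCUUAA"  # made-up sequence for illustration only
    print(antisense_oligo(example_mrna, 3, 9))
    print(covers_start_codon(example_mrna, 3, 9))
```

The sketch produces only perfectly complementary oligonucleotides; as noted above, absolute
complementarity is preferred but not required.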

Regardless of the choice of target sequence, in vitro studies can first be performed to
quantitate the ability of the antisense oligonucleotide to inhibit gene expression. These
studies utilize controls that distinguish between antisense gene inhibition and nonspecific
biological effects of oligonucleotides. These studies may also compare levels of the target
RNA or protein with that of an internal control RNA or protein. Additionally, it is
envisioned that results obtained using the antisense oligonucleotide are compared with those
obtained using a control oligonucleotide. It is preferred that the control oligonucleotide is of
approximately the same length as the test oligonucleotide and that the nucleotide sequence of
the oligonucleotide differs from the antisense sequence no more than is necessary to prevent
specific hybridization to the target sequence.

The oligonucleotides can be DNA or RNA or chimeric mixtures or derivatives or
modified versions thereof, single-stranded or double-stranded. The oligonucleotide can be
modified at the base moiety, sugar moiety, or phosphate backbone, for example, to improve
stability of the molecule, hybridization, etc. The oligonucleotide may include other appended
groups such as peptides (e.g., for targeting host cell receptors in vivo), or agents facilitating
transport across the cell membrane (see, e.g., Letsinger et al., 1989, Proc. Natl. Acad. Sci.
U.S.A. 86:6553-6556; Lemaitre et al., 1987, Proc. Natl. Acad. Sci. 84:648-652; PCT
Publication No. WO88/09810, published December 15, 1988) or the blood-brain barrier
(see, e.g., PCT Publication No. WO89/10134, published April 25, 1988),
hybridization-triggered cleavage agents (see, e.g., Krol et al., 1988, BioTechniques
6:958-976) or intercalating agents (see, e.g., Zon, 1988, Pharm. Res. 5:539-549). To this
end, the oligonucleotide may be conjugated to another molecule, e.g., a peptide,
hybridization triggered cross-linking agent, transport agent, hybridization-triggered cleavage
agent, etc.

The antisense oligonucleotide may comprise at least one modified base moiety which
is selected from the group including but not limited to 5-fluorouracil, 5-bromouracil,
5-chlorouracil, 5-iodouracil, hypoxanthine, xanthine, 4-acetylcytosine,
5-(carboxyhydroxylmethyl) uracil, 5-carboxymethylaminomethyl-2-thiouridine,
5-carboxymethylaminomethyluracil, dihydrouracil, beta-D-galactosylqueosine, inosine,
N6-isopentenyladenine, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine,
2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-adenine,
7-methylguanine, 5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil,
beta-D-mannosylqueosine, 5'-methoxycarboxymethyluracil, 5-methoxyuracil,
2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid (v), wybutoxosine,
pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil,
5-methyluracil, uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic acid (v),
5-methyl-2-thiouracil, 3-(3-amino-3-N-2-carboxypropyl) uracil, (acp3)w, and
2,6-diaminopurine.

The antisense oligonucleotide may also comprise at least one modified sugar moiety
selected from the group including but not limited to arabinose, 2-fluoroarabinose, xylulose,
and hexose.

In yet another embodiment, the antisense oligonucleotide comprises at least one
modified phosphate backbone selected from the group consisting of a phosphorothioate, a
phosphorodithioate, a phosphoramidothioate, a phosphoramidate, a phosphordiamidate, a
methylphosphonate, an alkyl phosphotriester, and a formacetal or analog thereof.

In yet another embodiment, the antisense oligonucleotide is an α-anomeric
oligonucleotide. An α-anomeric oligonucleotide forms specific double-stranded hybrids with
complementary RNA in which, contrary to the usual β-units, the strands run parallel to each
other (Gautier et al., 1987, Nucl. Acids Res. 15:6625-6641). The oligonucleotide is a
2'-O-methylribonucleotide (Inoue et al., 1987, Nucl. Acids Res. 15:6131-6148), or a chimeric
RNA-DNA analogue (Inoue et al., 1987, FEBS Lett. 215:327-330).

Oligonucleotides of the invention may be synthesized by standard methods known in
the art, e.g. by use of an automated DNA synthesizer (such as are commercially available
from Biosearch, Applied Biosystems, etc.). As examples, phosphorothioate oligonucleotides
may be synthesized by the method of Stein et al. (1988, Nucl. Acids Res. 16:3209),
methylphosphonate oligonucleotides can be prepared by use of controlled pore glass polymer
supports (Sarin et al., 1988, Proc. Natl. Acad. Sci. U.S.A. 85:7448-7451), etc.

While antisense nucleotides complementary to the target gene coding region
sequence could be used, those complementary to the transcribed untranslated region are
most preferred.

A number of methods have been developed for delivering antisense DNA or RNA to
cells; e.g., antisense molecules can be injected directly into the tissue site, or modified
antisense molecules, designed to target the desired cells (e.g., antisense linked to peptides or
antibodies that specifically bind receptors or antigens expressed on the target cell surface)
can be administered systemically.

However, it is often difficult to achieve intracellular concentrations of the antisense
sufficient to suppress translation of endogenous mRNAs. Therefore a preferred approach
utilizes a recombinant DNA construct in which the antisense oligonucleotide is placed under
the control of a strong pol III or pol II promoter. The use of such a construct to transfect
target cells in the patient will result in the transcription of sufficient amounts of single
stranded RNAs that will form complementary base pairs with the endogenous target gene
transcripts and thereby prevent translation of the target gene mRNA. For example, a vector
can be introduced in vivo such that it is taken up by a cell and directs the transcription of an
antisense RNA. Such a vector can remain episomal or become chromosomally integrated, as
long as it can be transcribed to produce the desired antisense RNA. Such vectors can be
constructed by recombinant DNA technology methods standard in the art. Vectors can be
plasmid, viral, or others known in the art, used for replication and expression in mammalian
cells. Expression of the sequence encoding the antisense RNA can be by any promoter
known in the art to act in mammalian, preferably human cells. Such promoters can be
inducible or constitutive. Such promoters include but are not limited to: the SV40 early
promoter region (Bernoist and Chambon, 1981, Nature 290:304-310), the promoter
contained in the 3' long terminal repeat of Rous sarcoma virus (Yamamoto et al., 1980, Cell
22:787-797), the herpes thymidine kinase promoter (Wagner et al., 1981, Proc. Natl. Acad.
Sci. U.S.A. 78:1441-1445), the regulatory sequences of the metallothionein gene (Brinster et
al., 1982, Nature 296:39-42), etc. Any type of plasmid, cosmid, YAC or viral vector can be
used to prepare the recombinant DNA construct which can be introduced directly into the
tissue site; e.g., atherosclerotic vascular tissue. Alternatively, viral vectors can be used
which selectively infect the desired tissue, in which case administration may be accomplished
by another route (e.g., systemically).

Ribozymes are enzymatic RNA molecules capable of catalyzing the specific cleavage
of RNA. The mechanism of ribozyme action involves sequence specific hybridization of the
ribozyme molecule to complementary target RNA, followed by an endonucleolytic cleavage.
Ribozyme molecules designed to catalytically cleave target gene mRNA transcripts can also
be used to prevent translation of target gene mRNA and expression of target gene. (See,
e.g., PCT International Publication WO90/11364, published October 4, 1990; Sarver et al.,
1990, Science 247:1222-1223). While ribozymes that cleave mRNA at site specific
recognition sequences can be used to destroy target gene mRNAs, the use of hammerhead
ribozymes is preferred. Hammerhead ribozymes cleave mRNAs at locations dictated by
flanking regions that form complementary base pairs with the target mRNA. The sole
requirement is that the target mRNA have the following sequence of two bases: 5'-UG-3'.
The construction and production of hammerhead ribozymes is well known in the art and is
described more fully in Haseloff and Gerlach, 1988, Nature, 334:585-591. For example,
there are hundreds of potential hammerhead ribozyme cleavage sites within the nucleotide
sequence of rchd534 and fchd540 cDNA. Preferably the ribozyme is engineered so that the
cleavage recognition site is located near the 5' end of the target mRNA; i.e., to increase
efficiency and minimize the intracellular accumulation of non-functional mRNA transcripts.
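Because the stated sequence requirement is only the dinucleotide 5'-UG-3', candidate
hammerhead cleavage sites in a target transcript can be enumerated with a simple scan. The
following is a minimal sketch under that assumption alone; the function names are
hypothetical, and the example sequence is made up for illustration.

```python
def hammerhead_sites(sequence):
    """Return the 0-based position of every 5'-UG-3' dinucleotide.

    cDNA input is accepted: T is read as U before scanning.
    """
    seq = sequence.upper().replace("T", "U")
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "UG"]


def sites_nearest_5_prime(sequence, count=3):
    """Return up to `count` candidate sites closest to the 5' end."""
    return hammerhead_sites(sequence)[:count]


if __name__ == "__main__":
    example = "AUGGCUUGA"  # made-up sequence for illustration only
    print(hammerhead_sites(example))
    print(sites_nearest_5_prime(example, count=1))
```

Ordering candidates from the 5' end reflects the stated preference for a cleavage
recognition site near the 5' end of the target mRNA; the scan does not assess the flanking
regions that must base pair with the ribozyme.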

The ribozymes of the present invention also include RNA endoribonucleases
(hereinafter "Cech-type ribozymes") such as the one which occurs naturally in Tetrahymena
thermophila (known as the IVS, or L-19 IVS RNA) and which has been extensively
described by Thomas Cech and collaborators (Zaug, et al., 1984, Science, 224:574-578;
Zaug and Cech, 1986, Science, 231:470-475; Zaug, et al., 1986, Nature, 324:429-433;
published International patent application No. WO 88/04300 by University Patents Inc.;
Been and Cech, 1986, Cell, 47:207-216). The Cech-type ribozymes have an eight base pair
active site which hybridizes to a target RNA sequence whereafter cleavage of the target
RNA takes place. The invention encompasses those Cech-type ribozymes which target eight
base-pair active site sequences that are present in target gene.

As in the antisense approach, the ribozymes can be composed of modified
oligonucleotides (e.g. for improved stability, targeting, etc.) and should be delivered to cells
which express the target gene in vivo, e.g., endothelial cells. A preferred method of delivery
involves using a DNA construct "encoding" the ribozyme under the control of a strong
constitutive pol III or pol II promoter, so that transfected cells will produce sufficient
quantities of the ribozyme to destroy endogenous target gene messages and inhibit
translation. Because ribozymes, unlike antisense molecules, are catalytic, a lower
intracellular concentration is required for efficiency.

Nucleic acid molecules to be used in triple helix formation for the inhibition of
transcription should be single stranded and composed of deoxyribonucleotides. The base
composition of these oligonucleotides must be designed to promote triple helix formation via
Hoogsteen base pairing rules, which generally require sizeable stretches of either purines or
pyrimidines to be present on one strand of a duplex. Nucleotide sequences may be
pyrimidine-based, which will result in TAT and CGC+ triplets across the three associated
strands of the resulting triple helix. The pyrimidine-rich molecules provide base
complementarity to a purine-rich region of a single strand of the duplex in a parallel
orientation to that strand. In addition, nucleic acid molecules may be chosen that are
purine-rich, for example, containing a stretch of G residues. These molecules will form a
triple helix with a DNA duplex that is rich in GC pairs, in which the majority of the purine
residues are located on a single strand of the targeted duplex, resulting in GGC triplets
across the three strands in the triplex.

Alternatively, the potential sequences that can be targeted for triple helix formation
may be increased by creating a so called "switchback" nucleic acid molecule. Switchback
molecules are synthesized in an alternating 5'-3', 3'-5' manner, such that they base pair with
first one strand of a duplex and then the other, eliminating the necessity for a sizeable stretch
of either purines or pyrimidines to be present on one strand of a duplex.

Target gene expression can also be reduced by inactivating or "knocking out" the
target gene or its promoter using targeted homologous recombination. (E.g., see Smithies et
al., 1985, Nature 317:230-234; Thomas & Capecchi, 1987, Cell 51:503-512; Thompson et
al., 1989 Cell 5:313-321; each of which is incorporated by reference herein in its entirety).
For example, a mutant, non-functional target (or a completely unrelated DNA sequence)
flanked by DNA homologous to the endogenous target gene (either the coding regions or
regulatory regions of the target gene) can be used, with or without a selectable marker
and/or a negative selectable marker, to transfect cells that express target in vivo. Insertion of
the DNA construct, via targeted homologous recombination, results in inactivation of the
target gene.

Alternatively, endogenous target gene expression can be reduced by targeting
deoxyribonucleotide sequences complementary to the regulatory region of the target gene
(i.e., the target promoter and/or enhancers) to form triple helical structures that prevent
transcription of the target gene in target cells in the body. (See generally, Helene, C., 1991,
Anticancer Drug Des., 6(6):569-84; Helene, C., et al., 1992, Ann. N.Y. Acad. Sci.,
660:27-36; and Maher, L.J., 1992, Bioassays 14(12):807-15).

5.3.1.2 Disruption of Target Genes
Endogenous target gene expression can also be reduced by inactivating or "knocking
out" the target gene or its promoter using targeted homologous recombination. (E.g., see
Smithies et al., 1985, Nature 317:230-234; Thomas & Capecchi, 1987, Cell 51:503-512;
Thompson et al., 1989 Cell 5:313-321; each of which is incorporated by reference herein in
its entirety). For example, a mutant, non-functional target (or a completely unrelated DNA
sequence) flanked by DNA homologous to the endogenous target gene (either the coding
regions or regulatory regions of the target gene) can be used, with or without a selectable
marker and/or a negative selectable marker, to transfect cells that express target in vivo.
Insertion of the DNA construct, via targeted homologous recombination, results in
inactivation of the target gene. Such approaches can be adapted for use in humans provided
the recombinant DNA constructs are directly administered or targeted to the required site in
vivo using appropriate viral vectors, e.g., vectors for delivery to vascular tissue.

An example of such an animal model is the apo-deficient mouse, which is an animal
model for atherosclerosis, in which the apo-E gene has been disrupted (Plump et al., 1992,
Cell 71:343-353). Using the methods disclosed herein, biological samples from animal
models such as the apo-deficient mouse can be analyzed to define green islands correlated
with an atherosclerotic disease state.

5.3.2 Perturbation Through Targeted Gene Expression
The physiological state of the cell can be perturbed by selectively regulating the
expression of one or more genes. For example, a given gene can be genetically engineered
so that its expression is controlled by a specialized promoter. Host cells can be transformed
with the target gene controlled by appropriate expression control elements (e.g., promoter,
enhancer sequences, transcription terminators, polyadenylation sites, etc.), and a selectable
marker. Following the introduction of the recombinant DNA construct, engineered cells may
be allowed to grow for 1-2 days in an enriched media, and then are switched to a selective
media. The selectable marker in the recombinant plasmid confers resistance to the selection
and allows cells to stably integrate the plasmid into their chromosomes and grow to form
foci which in turn can be cloned and expanded into cell lines. This method may
advantageously be used to engineer cell lines which express the target gene in a controlled
manner. Using these methods, the target gene can be engineered to be expressed under the
control of a highly expressed promoter, for example. Such promoters may be constitutive,
or regulatable such that high levels of expression can be induced upon addition of an
inducing compound or other stimulus (e.g., temperature shift in the case of a temperature
sensitive promoter). Alternatively, a gene which is normally highly expressed in a given cell
type and/or under certain physiological conditions can be selectively down-regulated by
adding a factor known to negatively regulate the recombinant promoter.

A number of selection systems may be used for introduction of the recombinant
construct, including but not limited to the herpes simplex virus thymidine kinase (Wigler, et
al., 1977, Cell 11:223), hypoxanthine-guanine phosphoribosyltransferase (Szybalska &
Szybalski, 1962, Proc. Natl. Acad. Sci. USA 48:2026), and adenine
phosphoribosyltransferase (Lowy, et al., 1980, Cell 22:817) genes, which can be employed in
tk-, hgprt- or aprt- cells, respectively. Also, antimetabolite resistance can be used as the
basis of selection for dhfr, which confers resistance to methotrexate (Wigler, et al., 1980,
Natl. Acad. Sci. USA 77:3567; O'Hare, et al., 1981, Proc. Natl. Acad. Sci. USA 78:1527);
gpt, which confers resistance to mycophenolic acid (Mulligan & Berg, 1981, Proc. Natl.
Acad. Sci. USA 78:2072); neo, which confers resistance to the aminoglycoside G-418
(Colberre-Garapin, et al., 1981, J. Mol. Biol. 150:1); and hygro, which confers resistance to
hygromycin (Santerre, et al., 1984, Gene 30:147) genes.

5.4 Characterization of Perturbed Genetic Networks
Another embodiment of the present invention utilizes experimental perturbations or
stimuli to modify the expression level of a predetermined gene, followed by characterization
of the subsequent expression levels of a plurality of genes in the network to identify genes
within the same green island. Damage, e.g. subsequent changes in the expression levels of
one or more genes in a network having a gene which has been perturbed as compared to the
expression levels of corresponding genes in an unperturbed network, will be substantially
confined to genes in the same green island. A gene or product within the perturbed copy
may be defined as "damaged" if the expression level of the gene is ever different, one or
more times, from the state or level of activity of the corresponding variable in the
unperturbed copy. Given this definition, a site can be different in its activities from the
unperturbed site many times, but it is only damaged once and remains damaged thereafter.
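The definition of damage above, under which a site is marked once and remains damaged
thereafter, can be expressed as a comparison of paired time series. The following is a
minimal sketch; the function name, the dictionary layout (gene name mapped to a list of
expression levels sampled at matching time points), and the tolerance parameter are
assumptions introduced for illustration only.

```python
def first_damage_times(unperturbed, perturbed, tolerance=0.0):
    """Map each damaged gene to the first time index at which it differs.

    A gene is damaged if its level in the perturbed copy ever differs from
    the unperturbed copy by more than `tolerance`. Later agreement does not
    clear the mark, so only the first differing time point is recorded.
    """
    damaged = {}
    for gene, baseline in unperturbed.items():
        for t, (a, b) in enumerate(zip(baseline, perturbed[gene])):
            if abs(a - b) > tolerance:
                damaged[gene] = t
                break
    return damaged


if __name__ == "__main__":
    # Made-up binary activation states for three genes at four time points.
    control = {"g1": [1, 1, 1, 1], "g2": [0, 1, 0, 1], "g3": [1, 0, 0, 1]}
    treated = {"g1": [0, 0, 0, 0], "g2": [0, 1, 1, 1], "g3": [1, 0, 0, 1]}
    damage = first_damage_times(control, treated)
    print(damage)       # genes that were ever different, with first time index
    print(len(damage))  # size of the damage avalanche
```

In this made-up example g2 agrees with the unperturbed copy again at the last time point
but is still counted as damaged, consistent with the definition above. The set of damaged
genes is the candidate membership of the green island containing the perturbed gene.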

Thus, by characterizing the expression levels of genes within a biological sample that
has not been perturbed and characterizing the expression levels of genes within a biological
sample that has been perturbed and comparing the expression levels of corresponding genes
in the perturbed and unperturbed sample one can identify which genes in the sample belong
to the same green island.

Perturbations can be achieved by a variety of means known in the art as described in
Sections 5.3 above and 5.6, below, including cloning an exogenous promoter or enhancer
upstream from the gene in question, and increasing the activity of the gene via that promoter.
RNA chip, protein gel, etc. analysis is then performed to determine which other genes or
products change their activities following the perturbation. In addition to cloning upstream
cis sites, injection of complementary RNA which hybridizes to the mRNA of a specific gene,
antisense, phage display peptides that bind the RNA in question or modulate the activity of
cis sites, extra copies of cis sites injected into cells, small molecules that modulate the
activity of the gene or product in question, or any other perturbation can be used to perturb
one or more genes in a genetic network and to initiate avalanches of change within the
network.

As a non-limiting example, a preferred embodiment of the invention proceeds by
characterizing the expression level of genes within a biological sample; altering or perturbing
the expression level of a single predetermined gene within the biological sample; allowing the
perturbed biological sample and an otherwise identical unperturbed biological sample to
evolve for a time sufficient to allow the expression levels of at least some genes to change;
characterizing the expression level of a plurality of genes within the perturbed and
unperturbed biological samples; and comparing the expression levels of corresponding genes
within the perturbed and unperturbed samples to identify genes that experience a change of
expression level or state in response to the perturbation.

The steps of altering or perturbing, allowing each sample to evolve in time,
characterizing, and comparing the expression levels of genes within the samples may be
repeated.
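
By way of non-limiting illustration, the comparing step may be sketched as follows. The gene names, expression values, and change threshold are hypothetical and are not taken from this description; the sketch assumes that each sample has already been characterized as a mapping from gene to measured expression level.

```python
from typing import Dict, List


def changed_genes(unperturbed: Dict[str, float],
                  perturbed: Dict[str, float],
                  threshold: float = 1.0) -> List[str]:
    """Return genes whose level differs between the samples by more than `threshold`."""
    shared = unperturbed.keys() & perturbed.keys()
    return sorted(g for g in shared
                  if abs(perturbed[g] - unperturbed[g]) > threshold)


# Hypothetical expression levels (arbitrary units) after both samples have evolved.
control = {"g1": 5.0, "g2": 2.0, "g3": 7.5, "g4": 0.5}
treated = {"g1": 9.0, "g2": 2.1, "g3": 3.0, "g4": 0.5}  # g1 is the perturbed gene

# Genes that changed in response to the perturbation (including g1 itself).
affected = changed_genes(control, treated, threshold=1.0)
```

Repeating the procedure with a different perturbed gene yields a further set of affected genes, which can be compared with the first.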

Characterization of the expression level of genes in the sample or the abundance of
proteins in the sample may be carried out using any methods for characterizing the
expression level of genes or proteins including but not limited to nucleotide arrays, SAGE,
or two dimensional protein gels, as described above.

It is possible that the network may exhibit more than one green island. In such cases,
the presence of more than one green island may be ascertained and genes belonging to each
green island identified.

5.5 Measures of Mutual Information

If there is correlation between the activation states (e.g., expression levels) of a
number of genes, then the current or past expression level of one gene may directly or
indirectly influence the current or future expression level of one or more of the remaining
genes. For example, if a protein expressed by a first gene inhibits the transcription or
translation of a second gene, then activation of the first gene likely reduces the level of
expression of the second gene. Thus, the activation state of the second gene is correlated
with the activation state of the first gene. Given this definition of correlation between genes,
if the activation states of two or more genes are correlated and the regulatory rules
corresponding to the genes are known, then knowledge of the state of one of the genes
provides information regarding the past, current, or future states of the remaining genes.
The activities of genes within a given green island are generally correlated whereas the
activities of genes in different green islands are generally uncorrelated. Thus, genes
identified as having correlated activities or levels of expression generally belong to the same
green island whereas genes identified as having uncorrelated activities or levels of expression
generally belong to different green islands.

In general, however, the regulatory rules corresponding to the genes may not be
known. One embodiment of the present invention, to characterize which genes are in the
same green island, is based on methods to measure correlations between changes in the
expression level of genes within one island, and the lack of correlation between changes of
expression levels of genes within different islands. The regulatory rules corresponding to the
expression of the genes need not be known to characterize the correlation between the
expression levels of the genes. Without limitation, one such measure of the correlation
between the expression levels of any two genes is given by mutual information. The mutual
information, M, between the expression levels of any two genes, A and B, may be described
by

\[
M = -\sum_{i=1}^{m} p(A_i)\log p(A_i) \;-\; \sum_{j=1}^{n} p(B_j)\log p(B_j) \;+\; \sum_{i=1}^{m}\sum_{j=1}^{n} p(A_i,B_j)\log p(A_i,B_j)
\]

where p(Ai) is the probability that the expression level of gene A is in the ith state or level of
activity, p(Bj) is the probability that the expression level of gene B is in the jth state or
activity level, and p(Ai,Bj) is the joint probability that the expression level of gene A is in the
ith state while the expression level of gene B is in the jth state. The negative of the sum of
terms p(Ai)logp(Ai) represents the entropy of gene A and is evaluated over the m discriminable
expression levels exhibited by gene A. The negative of the sum of terms p(Bj)logp(Bj)
represents the entropy of gene B and is evaluated over the n discriminable expression levels
exhibited by gene B. The negative of the sum of terms p(Ai,Bj)logp(Ai,Bj) represents the joint
entropy of genes A and B. The joint entropy is evaluated over the m discriminable expression
levels of gene A and the n discriminable expression levels of gene B. Thus, the mutual
information of the two genes is the sum of the entropy of gene A, H(A) = -Σ p(Ai)logp(Ai),
plus the entropy of gene B, H(B) = -Σ p(Bj)logp(Bj), minus the joint entropy,
H(AB) = -ΣΣ p(Ai,Bj)logp(Ai,Bj). That is, M = H(A) + H(B) - H(AB).
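
As a minimal sketch of the relation M = H(A) + H(B) - H(AB), the following computes the mutual information from a table of joint probabilities p(Ai,Bj). The base-2 logarithm, the function names, and the two example tables are assumptions made for illustration; the description does not fix the base of the logarithm.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Entropy -sum p*log2(p); zero-probability states contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mutual_information(joint: Sequence[Sequence[float]]) -> float:
    """M = H(A) + H(B) - H(AB), where joint[i][j] = p(A_i, B_j)."""
    p_a = [sum(row) for row in joint]         # marginal p(A_i)
    p_b = [sum(col) for col in zip(*joint)]   # marginal p(B_j)
    h_ab = entropy([p for row in joint for p in row])
    return entropy(p_a) + entropy(p_b) - h_ab


# Hypothetical two-state genes (m = n = 2).
perfectly_correlated = [[0.5, 0.0], [0.0, 0.5]]
independent = [[0.25, 0.25], [0.25, 0.25]]

mi_correlated = mutual_information(perfectly_correlated)
mi_independent = mutual_information(independent)
```

In this sketch the perfectly correlated table gives one bit of mutual information and the independent table gives none.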

If the probabilities p(Ai), p(Bj), and p(Ai,Bj) are not known, the probability that a
gene exhibits a given expression level may be replaced by the fraction of time the gene
exhibits that expression level. For example, in one embodiment of the present invention,
the genes A and B correspond to a pair of genes in a given cell type or in different cell types.
As an example, consider a set of genes modeled as a synchronous Boolean network on a
single state cycle with 10 states within the state cycle, wherein the activity of each gene
may be described by one of two states. The state of the expression level of each gene may
be defined as either on (e.g., active) or off (e.g., inactive). In this case, for example, the
entropy of A is given by the negative of the sum of p(Ai)logp(Ai), where the term
p(Ai)logp(Ai) is evaluated over two cases: first, where p(A1) is the fraction of time that gene A
is on and, second, where p(A2) is the fraction of time that gene A is off. The entropy of B is
evaluated similarly based on the fraction of time B is either on or off. The joint entropy
considers the fraction of time that genes A and B are simultaneously on, the fraction of time
that genes A and B are simultaneously off, the fraction of time that A is on when B is off, and
the fraction of time that A is off when B is on.

The mutual information between any two variables is non-zero only when both
variables exhibit correlated, unfixed changes in state. For example, when the expression
levels of genes A and B are uncorrelated, then H(AB) = H(A) + H(B) and the mutual
information is 0. Similarly, when the expression level of one of two genes is fixed, then its
entropy is 0, and the joint entropy is equal to the entropy of the remaining unfixed gene.
Thus, the mutual information between two genes is also 0 when the expression level of at
least one of the two genes is fixed.
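
The fraction-of-time estimate and the two zero cases described above can be illustrated with a hypothetical state cycle of 10 states. The gene trajectories below are invented for the illustration, and the base-2 logarithm is an assumption; probabilities are estimated as the fraction of the cycle spent in each state.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy_of(observations: Sequence[Hashable]) -> float:
    """Entropy with each probability replaced by the observed fraction of time."""
    n = len(observations)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(observations).values())


def mutual_information(a: Sequence[int], b: Sequence[int]) -> float:
    """M = H(A) + H(B) - H(AB), estimated from paired observations."""
    return entropy_of(a) + entropy_of(b) - entropy_of(list(zip(a, b)))


# One hypothetical pass around a 10-state cycle (1 = on, 0 = off).
gene_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
gene_b = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # on exactly when A is off
gene_c = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]  # fixed expression level
gene_d = [1, 1, 0, 0, 0, 1, 1, 0, 0, 0]  # changes, but independently of A

mi_ab = mutual_information(gene_a, gene_b)  # correlated, unfixed genes
mi_ac = mutual_information(gene_a, gene_c)  # one gene fixed
mi_ad = mutual_information(gene_a, gene_d)  # uncorrelated genes
```

Only the pair A and B, whose changes are correlated, yields a non-zero value in this sketch.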

For example, one can test whether two genes, A and B, on one attractor of a
Boolean network are in the same isolated green island, or in different islands. To do so, a
mutual information test is carried out between the expression levels of all pairs of genes
whose activities change within or between attractors (or within or between cell types of real
organisms). Numerical analysis using synchronous Boolean networks in the ordered regime,
generated by matching the known distribution of canalyzing functions observed in regulated
eukaryotic genes, shows that this test suffices to distinguish most genes within the same
green island from genes in different green islands (FIG. 3). Similarly, analysis of mutual
information between the genes, but where the analysis is taken over many or all of the
alternative attractors of the same network, again shows that this measure suffices to
distinguish most genes that are in the same green island from those that are in different
islands (FIG. 4).

In one embodiment of the present invention, the corresponding characterization of
the expression levels of genes within one or more cells, sets of cells or tissue samples
includes characterization of the expression level of genes within one or more biological
samples such as one or more cells, sets of cells or tissues. If a gene (or protein abundance)
exhibits a range of expression levels, defining or binning the observed abundances of each
gene's transcription activity into two or more discriminable ranges allows mutual information
measures to be constructed. Then, mutual information tests are carried out between the
expression levels of all pairs of genes (or protein abundances) whose activities change within
or between cells, cell types or tissue samples to identify genes that substantially correspond
to members of the same or different green islands. Mutual information tests may also be
carried out between pairs of proteins whose levels or concentrations change between cells,
cell types, or tissue samples.
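
A minimal sketch of the binning step followed by pairwise mutual information tests is given below. The abundance values, the single bin edge at 1.0, the threshold of 0.5 bits, and all names are hypothetical; any number of discriminable ranges could be used by supplying more edges.

```python
import bisect
import math
from collections import Counter
from itertools import combinations


def bin_levels(values, edges):
    """Map each measurement to the index of the discriminable range it falls in."""
    return [bisect.bisect_right(edges, v) for v in values]


def entropy_of(observations):
    n = len(observations)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(observations).values())


def mutual_information(a, b):
    return entropy_of(a) + entropy_of(b) - entropy_of(list(zip(a, b)))


# Hypothetical abundances of three genes measured in eight samples.
measurements = {
    "geneX": [0.1, 0.2, 2.5, 2.7, 0.3, 2.9, 0.2, 2.6],
    "geneY": [3.1, 2.8, 0.4, 0.2, 3.0, 0.1, 2.9, 0.3],
    "geneZ": [0.2, 2.6, 0.1, 2.8, 2.7, 0.3, 0.2, 2.9],
}
edges = [1.0]  # assumed boundary between a "low" and a "high" range

states = {gene: bin_levels(vals, edges) for gene, vals in measurements.items()}
pairwise = {
    (g1, g2): mutual_information(states[g1], states[g2])
    for g1, g2 in combinations(sorted(states), 2)
}
# Pairs above the assumed threshold are candidates for the same green island.
candidates = [pair for pair, mi in pairwise.items() if mi > 0.5]
```

With these invented data, only geneX and geneY are flagged as candidate members of the same island.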

In another embodiment of the present invention, characterization of the expression
levels of genes (or protein abundances) within one or more biological samples is repeated
at least once to establish a temporal record of the expression levels of the genes or of the
protein states. Then, mutual information tests are carried out between all pairs of genes (or
protein abundances) whose activities change over the temporal record to identify genes that
are members of the same green island.

5.6 Characterization of Patterns of Gene Expression Levels
Another embodiment of the invention is based on the analysis of the level of gene
expression from more than one alternative cell type of the same organism. As a non-limiting
example of the algorithm, consider the hypothetical case of a genetic network with three
isolated green islands, A, B, and C, where A has two alternative steady state attractors, B has
three alternative steady state attractors, and C has four alternative steady state attractors.
The attractors within each island represent substantially recurrent patterns of the expression
levels of genes within each island that generally occupy a sub-volume of the space
containing all possible states of the expression levels of genes within each island. The genes
within each island can occupy one or more discriminable levels of expression, and each state
or level of expression of a gene corresponds to one of the attractors exhibited by the island
containing the gene.

By way of non-limiting example, assume that A contains 5 genes, B contains 11
genes, and C contains 21 genes. In this example, it is further assumed, without limitation,
that each expression level of each of the 37 genes can be characterized and/or discriminated
using the measurement approaches described above.

Consider the total set of 5 + 11 + 21 = 37 genes associated with the green islands.
The total number of alternative attractors associated with the 37 variables of the network is
given by the product of the number of attractors of the three islands, or 2 x 3 x 4 = 24. Thus,
the network as a whole exhibits 24 attractors, which correspond to recurrent patterns of
expression levels of the 37 network genes. As a comparison, Hydra has about 13 cell types,
e.g. attractors, and humans have about 265 cell types, e.g. attractors.

For reference purposes, each gene may be assigned a number from 1 to 37.
However, the number of a given gene is not a function of the island to which the gene
belongs. Thus, a given gene may belong to any of the three islands.

A preferred embodiment of the invention comprises randomly specifying an initial
ordering of the 37 genes associated with the 3 islands from among the 37! possible orderings
of the genes. Without loss of generality, let the initial ordering be the natural, numerical
ordering 1, 2, 3, ..., 37. Beginning with gene 1, the 37 genes may be grouped into 37
successively larger sets of genes containing from 1 to 37 genes. For example, set 1 might
include only gene 1, set 2 might include genes 1 and 2, set 3 might include genes 1-3, and set
37 might include genes 1-37.

Each set of genes has a given number of patterns, wherein each pattern corresponds
to a possible combination of the expression levels associated with the genes contained in the
set. The algorithm proceeds by identifying the number of patterns of expression levels
associated with each set of genes. If gene 1 happens to be associated with island C, which
has four attractors, then, according to this example, gene 1 will exhibit four discriminably
different activity levels, e.g. patterns.

Next, the number of different patterns of activity associated with the set of genes 1
and 2 is identified. By way of example only, assume that gene 2 happens to be in isolated
green island A, which has two alternative attractors. Further assume that gene 2 has two
discriminable activity levels associated with the two attractors. Then, because genes 1 and 2
belong to different islands, the total number of patterns exhibited by genes 1 and 2 will be 8,
because the activity of gene 2 may exhibit either of two levels for each of the 4 activity levels
of gene 1.

Next, assume that gene 3 lies in green island C. Then, because genes 1 and 3 belong
to the same island, the total number of patterns for a set containing genes 1, 2, and 3 will
remain 8. Genes 1 and 3 reside in island C and together exhibit a total of four patterns,
corresponding to the steady state levels of both genes on the four different attractors of
island C; these four patterns are multiplied by the two patterns due to gene 2 in isolated
island A with its two alternative attractors.

Next, assume that gene 4 lies in isolated island B, an island with three attractors.
Consequently, the activity level of gene 4 is substantially uncorrelated with the activity levels
of genes 1-3. Thus, the set containing genes 1-4 will exhibit 24 total patterns because gene 4
may exhibit one of three activity levels for each of the eight patterns produced by genes 1-3.

Subsequently, repeating the analysis for sets containing genes 1-5, 1-6, ..., 1-37 will
reveal no new patterns of gene activity.
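
The walk-through above can be sketched as follows. The island membership of genes 1-4 follows the example; the membership of the remaining 33 genes is an arbitrary assumption chosen only to match the island sizes of 5, 11, and 21. For simplicity the sketch also assumes that each gene's discriminable level is simply the index of the attractor occupied by its island.

```python
from itertools import product

ATTRACTORS = {"A": 2, "B": 3, "C": 4}  # attractors per island, as in the example

# Genes 1-4 as in the text (C, A, C, B); the remaining assignments are hypothetical.
membership = ["C", "A", "C", "B"] + ["A"] * 4 + ["B"] * 10 + ["C"] * 19

# One row of 37 gene levels for each of the 2 x 3 x 4 = 24 network attractors.
islands = sorted(ATTRACTORS)
cell_types = []
for combo in product(*(range(ATTRACTORS[i]) for i in islands)):
    attractor_of = dict(zip(islands, combo))
    cell_types.append([attractor_of[isl] for isl in membership])


def prefix_pattern_counts(rows, n_genes):
    """Number of distinct patterns shown by genes 1..k, for k = 1..n_genes."""
    return [len({tuple(row[:k]) for row in rows}) for k in range(1, n_genes + 1)]


counts = prefix_pattern_counts(cell_types, len(membership))
# Multiplicative jump contributed by each gene when it is added to the set.
jumps = [counts[0]] + [counts[k] // counts[k - 1] for k in range(1, len(counts))]
```

A jump greater than 1 marks the first gene sampled from a new island, and the size of the jump is that island's number of attractors.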

Completing the algorithm for only one ordering among the 37! possible orderings of
the genes indicates that there are three isolated islands, one with two attractors, one with
three attractors, and one with four attractors. Further, for each of the 37 genes, the data
indicate whether that gene alone exhibits 2, 3, or 4 patterns.

Thus, at the end of this single 37 gene analysis, we know for each gene which island
it is in, and the number of attractors of that island.

For further confirmation, the algorithm proceeds as follows. First, the 37 genes may
be grouped into a different ordering of sets and the analysis repeated. The same spectrum of
increases of observed patterns should be observed among the 24, depending upon the
order in which genes from the three islands are sampled. In all cases, a doubling of total
patterns, a tripling of total patterns, or a quadrupling of total patterns should be observed.

Further, there may be many islands with the same number of alternative attractors.
For example, let C and D be two such islands with four attractors each, resulting in a new
system with islands having 2 x 3 x 4 x 4 = 96 total patterns. The same analysis may
be carried out for, say, the natural ordering of genes 1, 2, ..., 37 to reveal which genes are in
which island. Each green gene may be classified as to how many patterns it alone
exhibits. Then a histogram may be produced displaying which genes exhibit 2 patterns, 3
patterns, 4 patterns, and so forth. We then pairwise test genes within each such class, say the
2 pattern class, to confirm whether jointly they show 2 or 4 total patterns among the 96. This
provides a second test to tell if the two genes are in the same island (they jointly exhibit only
2 patterns) or if the two genes are in two islands (they jointly exhibit 4 patterns).
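
The pairwise test can be sketched for the class of genes that each show four patterns alone, using two islands C and D with four attractors each. The gene labels and their island assignments are hypothetical, and each gene's level is again assumed to equal the attractor index of its island.

```python
from itertools import product

ATTRACTORS = {"A": 2, "B": 3, "C": 4, "D": 4}

# Hypothetical genes and the islands they are assumed to belong to.
membership = {"a1": "A", "b1": "B", "c1": "C", "c2": "C", "d1": "D"}

islands = sorted(ATTRACTORS)
cell_types = []
for combo in product(*(range(ATTRACTORS[i]) for i in islands)):
    attractor_of = dict(zip(islands, combo))
    cell_types.append({g: attractor_of[isl] for g, isl in membership.items()})


def n_patterns(genes):
    """Number of distinct joint patterns shown by `genes` across all cell types."""
    return len({tuple(ct[g] for g in genes) for ct in cell_types})


def same_island(g1, g2):
    """Two genes of one class share an island if jointly they show no more
    patterns than either shows alone."""
    joint = n_patterns([g1, g2])
    return joint == n_patterns([g1]) and joint == n_patterns([g2])


# Histogram-style classification: patterns each gene exhibits alone.
alone = {g: n_patterns([g]) for g in membership}
```

Here c1 and c2 jointly show 4 patterns and are placed in one island, whereas c1 and d1 jointly show 16 patterns and are placed in different islands.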

Clearly, the analysis of the activity levels of genes or proteins associated with the 24
attractors of the three-island network, or with the attractors of the four-island network, also
reveals the fixed red genes, which exhibit a single fixed activity on all attractors.
Meanwhile, given a random ordering among the (here 37) green genes on each of
many assays, analysis of the total patterns seen as the ordered set of genes increases
gives an estimate of the number of genes in each class of "island attractors": the islands with
2 attractors, the islands with 3 attractors, the islands with 4 attractors, etc. This estimate is
based on how often a jump in total number of patterns by a doubling, tripling, or quadrupling
is found.

If the total number of "green genes" is large, say 20,000 to 40,000 for a human, a
similar analysis for a few hundred random orderings among the 20,000 to 40,000, for the first
ten to twenty genes in each ordering, analyzed across the 265 or so cell types of a human,
should suffice to characterize most or all of the different green islands in the human genome.
More precisely, given hypotheses about the number and size distribution of such islands, the
number of ordered sets that must be sampled to ensure that all islands have been sampled at
least by one gene in one ordering among the 20,000 to 40,000 green genes can be calculated.
From this data, most of the green islands can be recovered with a reasonable amount of
analysis. Further analysis of which of the remaining 20,000 to 40,000 genes are in each island
can be carried out as above, or by damage analysis or mutual information analysis.
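
One way such a calculation might be set up is sketched below, under the assumptions that each ordering is an independent, uniformly random permutation and that only the first few genes of each ordering are analyzed. The gene count, island size, prefix length, and number of orderings are hypothetical inputs, and the sketch treats a single island size rather than a full size distribution.

```python
from math import comb


def miss_probability(n_genes: int, island_size: int,
                     prefix_len: int, n_orderings: int) -> float:
    """Probability that no gene of a given island appears among the first
    `prefix_len` genes of any of `n_orderings` independent random orderings."""
    # Chance that one random prefix avoids every gene of the island.
    single = comb(n_genes - island_size, prefix_len) / comb(n_genes, prefix_len)
    return single ** n_orderings


# Hypothetical case: 30,000 green genes, an island of 100 genes,
# 300 orderings with the first 20 genes analyzed in each.
p_missed = miss_probability(n_genes=30_000, island_size=100,
                            prefix_len=20, n_orderings=300)
```

The number of orderings can be increased until the miss probability for the smallest hypothesized island falls below a chosen tolerance.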

Once a set of genes is shown to be in the same green island, mutual information can
also be used to discriminate which genes are direct and which are not direct inputs to one
another. For example, if A is an input to B and B is an input to C, but A is not an input to C,
then the mutual information about C given A cannot be greater than the amount given by B.
The failure of inclusion of data about A to increase mutual information about C establishes
that A is not a direct input to C, while the presence of mutual information between A and C,
summed over all states of B, establishes that A influences C. Jointly the two indicate an
indirect connection between A and C. If mutual information is obtained from a temporal
series of characterizations of the activation states of genes or of protein states, then whether A
influences C or C influences A can be discriminated by calculating mutual information for A
and C for pairs of times with the state of A before C, versus pairs of times with the state of A
after C.
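
The comparison of time-ordered pairs can be sketched as follows. The temporal record is invented for illustration: gene C is constructed to copy the state of gene A one time step later. The one-step lag, the base-2 logarithm, and the function names are assumptions.

```python
import math
from collections import Counter


def entropy_of(observations):
    n = len(observations)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(observations).values())


def mutual_information(a, b):
    return entropy_of(a) + entropy_of(b) - entropy_of(list(zip(a, b)))


def lagged_mi(earlier, later, lag=1):
    """Mutual information between `earlier` at time t and `later` at time t + lag."""
    return mutual_information(earlier[:-lag], later[lag:])


# Hypothetical temporal record of 17 snapshots (1 = on, 0 = off).
gene_a = [0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1]
gene_c = [0] + gene_a[:-1]  # C follows A with a delay of one step

a_before_c = lagged_mi(gene_a, gene_c)  # state of A paired with the later state of C
c_before_a = lagged_mi(gene_c, gene_a)  # state of C paired with the later state of A
```

In this sketch the value with A taken before C is the larger of the two, which is the pattern that would be read as A influencing C rather than the reverse.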

It will be clear to those skilled in the art that these procedures generalize to cases
where the behaviors of "green genes" on each attractor are not steady state behaviors, but
have more complex time signatures, so long as a unique and discriminable signature can be
assigned to each gene for each of the alternative attractors of the green island of which it is a
member. In the simplest case, that signature might be the average over the attractor. Thus, a
population of the same cell type distributed randomly around the attractor state cycle or orbit
could yield a different "average signature" for a given gene for each of the different
attractors of the green island in which that gene resides.

In the case of tissues with more than one cell type in the RNA chip snapshot,
deconvolution methods based on maximum likelihood estimates are necessary. In these
cases, any histological data or other data that gives evidence of the number, fractions, and
types of cells in the tissue sample are helpful.

It is possible that cells in an organism have but a single "green" island. Indeed, cells
might actually be in the chaotic regime, with a single percolating green sea. If so, the above
methods to discover the number of green islands and the number of alternative attractors per
island will discover this fact.
Further tests that cells are in the ordered versus chaotic regimes can be based on
experimental characterization of Derrida curves by analysis of the time patterns of activities
of control and perturbed cell populations in which random subsets of 1, 2, or many genes or
their products are transiently perturbed. The number of genes perturbed corresponds to the
initial distance between the state of the perturbed and unperturbed cell or cell population.
This can be affirmed by RNA snapshots, protein snapshots, or both. The later distance between
their states at the RNA or protein levels at short and longer time intervals establishes the
Derrida curve convergence or divergence for each initial distance and time difference between
that pair of states. By averaging over different pairs of states at the same initial distance, the
average convergence or divergence in state space can be sampled. This can be achieved by
perturbing the same set of genes for different cell types of the same organism taken at
different stages of development and pathogenesis.
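
A minimal sketch of the averaging step follows. It assumes that each snapshot has been reduced to a tuple of binary gene states and that distance is measured as the Hamming distance (the number of genes whose states differ); both are assumptions for illustration, and the snapshot values are invented.

```python
from collections import defaultdict
from statistics import mean


def hamming(s1, s2):
    """Number of genes whose states differ between two snapshots."""
    return sum(x != y for x, y in zip(s1, s2))


def derrida_points(pairs):
    """Map each initial distance to the mean distance at the later time.

    `pairs` holds tuples (control_t0, perturbed_t0, control_t1, perturbed_t1).
    """
    later = defaultdict(list)
    for c0, p0, c1, p1 in pairs:
        later[hamming(c0, p0)].append(hamming(c1, p1))
    return {d0: mean(ds) for d0, ds in sorted(later.items())}


# Hypothetical binarized snapshots of five genes.
pairs = [
    ((0, 1, 1, 0, 1), (1, 1, 1, 0, 1), (0, 1, 0, 0, 1), (0, 1, 0, 0, 1)),
    ((1, 0, 0, 1, 1), (1, 0, 1, 1, 1), (1, 1, 0, 1, 0), (1, 1, 0, 1, 1)),
    ((0, 0, 1, 1, 0), (1, 1, 1, 1, 0), (0, 1, 1, 0, 0), (0, 1, 1, 0, 1)),
]
curve = derrida_points(pairs)
```

A mean later distance below the initial distance indicates convergence in state space, and a mean later distance above it indicates divergence.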

5.7 Computer Systems 

FIG. 5 discloses a representative computer system 810 in conjunction with which the
embodiments of the present invention may be implemented. Computer system 810 may be a
personal computer, workstation, or a larger system such as a minicomputer. However, one
skilled in the art of computer systems will understand that the present invention is not limited
to a particular class or model of computer.

As shown in FIG. 5, representative computer system 810 includes a central processing
unit (CPU) 812, a memory unit 814, one or more storage devices 816, an input device 818, an
output device 820, and communication interface 822. A system bus 824 is provided for
communications between these elements. Computer system 810 may additionally function
through use of an operating system such as Windows, DOS, or UNIX. However, one skilled
in the art of computer systems will understand that the present invention is not limited to a
particular operating system.

Storage devices 816 may illustratively include one or more floppy or hard disk drives,
CD-ROMs, DVDs, or tapes. Input device 818 comprises a keyboard, mouse, microphone, or
other similar device. Output device 820 is a computer monitor or any other known computer
output device. Communication interface 822 may be a modem, a network interface, or other
connection to external electronic devices, such as a serial or parallel port.

Exemplary configurations of the representative computer system 810 include
client-server architectures, parallel computing, distributed computing, the Internet, etc.
However, one skilled in the art of computer systems will understand that the present invention
is not limited to a particular configuration.
While the above invention has been described with reference to certain preferred
embodiments, the scope of the present invention is not limited to these embodiments. One
skilled in the art may find variations of these preferred embodiments which, nevertheless, fall
within the spirit of the present invention, whose scope is defined by the claims set forth below.

6. EXAMPLE: CANDIDATE PROKARYOTIC GENETIC REGULATORY NETWORK

The following example of a candidate genetic regulatory network is provided for
purposes of illustration, and not limitation. The principles can be readily applied using materials
and methods well known in the art to other biological systems, including, but not limited to,
eukaryotic cell cultures, tissue samples, and organisms, including transgenic animals.

A candidate genetic regulatory network for analysis in accordance with the invention is
described in Babitzke et al., 1992, Journal of Bacteriology 174: 2059-2064, which is hereby
incorporated by reference in its entirety. This reference describes a set of genes in the
prokaryotic organism Bacillus subtilis which are involved in aromatic amino acid biosynthesis
and are regulated by tryptophan. Thus, the physiological state of a culture of Bacillus subtilis
can be analyzed by measuring the expression level of the members of this candidate genetic
regulatory network under a given physiological state. For example, the culture can be grown
in the absence of tryptophan. The expression levels of mRNA in the cells can be analyzed by
harvesting mRNA and hybridizing the mRNA to a nucleotide array comprising appropriate
nucleotide sequences as described in Section 5.2 above, using methods described in U.S. Patent
No. 5,837,832. These expression levels can then be compared to those of a culture of Bacillus
subtilis grown in the presence of abundant tryptophan. Inclusion of nucleotide sequences in the
nucleotide array corresponding to the genes identified in Figure 3 of Babitzke et al. can be used
to confirm the network relationships and regulatory effects of the genes shown therein. In
addition, inclusion of a vast plurality of other nucleotide sequences afforded by the use of
nucleotide array chips can be used to identify other potential members of the genetic regulatory
network.

Hybridization signals that specifically change intensity under one condition (e.g., +
tryptophan) represent a specific change in expression as a result of the physiological
perturbation of adding tryptophan. The genes whose expression level changes as a result of the
tryptophan-induced perturbation are designated as damaged. The number of damaged genes is
then analyzed using the numerical models for damage in a perturbed genetic regulatory network
described in Sections 5.5-5.7 above, to identify the green island that contains the genes whose
expression is affected by tryptophan. This green island, and the expression levels of its
individual member genes, provide snapshots of the physiological state of the Bacillus subtilis
cells that are characteristic of either the presence or absence of tryptophan. Furthermore,
individual members of this green island can then be identified by analyzing the sequences that
are differentially expressed in response to the addition of tryptophan.

The present invention is not to be limited in scope by the specific embodiments described
herein, which are intended as single illustrations of individual aspects of the invention, and
functionally equivalent methods and components are within the scope of the invention. Indeed,
various modifications of the invention, in addition to those shown and described herein, will
become apparent to those skilled in the art from the foregoing description and accompanying
drawings. Such modifications are intended to fall within the scope of the appended claims.

Various references, including patent applications, patents, and other publications, are
cited herein, the disclosures of which are incorporated by reference in their entireties.

WHAT IS CLAIMED IS:

1. A method for partitioning a plurality of genes into one or more groups
comprising the steps of:
selecting a first one of said genes and a second one of said genes;
measuring a degree of correlation between said first gene and said second gene; and
assigning said first gene and said second gene into a same one of said groups
if said degree of correlation exceeds a predetermined threshold.

2. A method for partitioning a plurality of genes as in claim 1 further comprising
the step of repeating said selecting a first one and a second one of said genes step, said
measuring a degree of correlation step and said assigning step for one or more pairs of said
plurality of genes.

3. A method for partitioning a plurality of genes as in claim 1 wherein said
measuring a degree of correlation step comprises the steps of:
defining a state for each of said plurality of genes;
observing said state of said first gene and said second gene; and
computing said degree of correlation of said state of said first gene and said
state of said second gene.

4. A method for partitioning a plurality of genes as in claim 3 wherein said
degree of correlation represents a mutual information, MI, between said first gene and said
second gene.

5. A method for partitioning a plurality of genes as in claim 4 wherein said
mutual information is defined as:

MI = H(A) + H(B) - H(AB)

wherein,
A represents said first gene,
B represents said second gene,
H(A) represents an entropy of said first gene,
H(B) represents an entropy of said second gene, and
H(AB) represents a joint entropy of said first gene and said second gene.

6. A method for partitioning a plurality of genes as in claim 5 wherein said
entropy of said gene is defined as:

\[
H = -\sum_{i} p(i)\log p(i)
\]

wherein,
i represents said state of said gene,
p(i) represents a probability that said gene is in said state,
log represents a logarithm operation, and
Σ represents a summation over possible ones of said states of said gene.

7. A method for partitioning a plurality of genes as in claim 1 wherein a
Boolean variable represents said state of each of said genes.

8. A method for partitioning a plurality of genes as in claim 7 wherein
said Boolean variable has a value of one if said gene is on and has a value of zero if said gene
is off.

9. A method for partitioning a plurality of genes as in claim 3 further
comprising the preliminary step of identifying one or more of said plurality of genes that have
a changing state.

10. A method for partitioning a plurality of genes as in claim 9 wherein
said first one and said second one of said genes are selected from said identified one or more
of said plurality of genes.

11. A method for partitioning a plurality of genes as in claim 1 wherein a
multi-valued variable represents said state of each of said genes, said multi-valued variable
measuring an activity of said gene.

12. A system for partitioning a plurality of genes into one or more groups
comprising:
a programmed computer comprising a memory having at least one region
storing computer executable program code and a processor for executing the program code
stored in said memory, wherein the program code includes:
code to select a first one of said genes and a second one of said genes;
code to measure a degree of correlation between said first gene and said
second gene; and
code to assign said first gene and said second gene into a same one of said
groups if said degree of correlation exceeds a predetermined threshold.

13. A system for partitioning a plurality of genes into one or more groups
as in claim 12 wherein the program code further includes:
code to define a state for each of said plurality of genes.

14. A system for partitioning a plurality of genes into one or more groups
as in claim 13 further comprising an RNA chip for observing said state of said first gene and
said second gene.

15. A system for partitioning a plurality of genes into one or more groups
as in claim 14 wherein the program code further includes:
code to receive said state of said first gene and said second gene from said
RNA chip; and
code to compute said degree of correlation of said state of said first gene and
said state of said second gene.



16. A method for partitioning a plurality of genes into one or more groups
comprising the steps of:
defining a state for each of said genes;
selecting at least one of said genes;
initiating a perturbation on said selected gene to change said state of said
selected gene;
identifying zero or more of said genes that experience a change in said state in
response to said perturbation.

17. A method for partitioning a plurality of genes as in claim 16 further
comprising the step of repeating said selecting at least one of said genes step, said initiating
a perturbation step and said identifying zero or more of said genes that experience a change
step.

18. A method for partitioning a plurality of genes as in claim 16 wherein
said initiating a perturbation step comprises the steps of:
cloning at least one exogenous promoter that is upstream from said selected
gene; and
turning said selected gene on via said cloned exogenous promoter.

19. A method for partitioning a plurality of genes as in claim 16 wherein
said initiating a perturbation step comprises the step of cloning at least one enhancer that is
upstream from said selected gene.

20. A method for partitioning a plurality of genes into one or more groups
comprising the steps of:
observing a state of said genes;
assigning at least one of said genes to a set; and
identifying a number of patterns of said state of said genes in said set.

21. A method for partitioning a plurality of genes as in claim 20 further
comprising the steps of:
assigning at least a second of said genes to said set;
identifying a number of patterns of said state of said genes in said set.

22. A method according to claim 21 further comprising the step of
assigning a multi-valued variable to represent said state of each gene.

23. A method according to claim 22 wherein said patterns represent
combinations of said multi-valued variable.

24. A system for partitioning a plurality of genes into one or more groups
comprising:
a programmed computer comprising a memory having at least one region
storing executable program code and a processor for executing the program code stored in
said memory, wherein the program code includes:
code to assign at least one of said genes to a set;
code to identify a number of patterns of said state of said genes in said set.

25. A system for partitioning a plurality of genes into one or more groups
as in claim 24 wherein the program code further includes:
code to assign at least a second of said genes to said set;
code to identify a number of patterns of said state of said genes in said set.

26. A method of determining characteristics of a plurality of genes
comprising the steps of:
partitioning said plurality of genes into one or more groups;
defining a state for each of said groups; and
determining the number of steady state values of said state of said groups.